AITopics | gradient td method

Collaborating Authors

gradient td method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reviews: Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting

Neural Information Processing SystemsOct-7-2024, 17:43:31 GMT

It is well known that the standard TD algorithm widely used in reinforcement learning does not correspond to the gradient of any objective function, and consequently is unstable when combined with any type of function approximation. Despite the success of methods like deep RL, which combines vanilla TD with deep learning, theoretically TD with nonlinear function approximation is demonstrably unstable. Much work on fixing this fundamental flaw in RL has been in vain, till the work on gradient TD methods by Sutton et al. Unfortunately, these methods work, but their analysis was flawed, based on a heuristic derivation of the method. A recent breakthrough by Liu et al. (UAI 2015) showed that gradient TD methods are essentially saddle point methods that are pure gradient methods that optimize not the original gradient TD loss function (which they do not), but rather the saddle point loss function that arises when converting the original loss function into the dual space.

function approximation, gradient td method, sample analysis, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Investigating practical linear temporal difference learning

White, Adam, White, Martha

arXiv.org Machine LearningMar-30-2016

Off-policy reinforcement learning has many applications including: learning from demonstration, learning multiple goal seeking policies in parallel, and representing predictive knowledge. Recently there has been an proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximation, linear complexity, and temporal difference (TD) updates. This paper contains two main contributions. First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms. Second, we perform an empirical comparison to elicit which of these new linear TD methods should be preferred in different situations, and make concrete suggestions about practical use.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1602.08771

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback